Because several other classes can change the internal state of the RoutingNodes data structure, the AllocationDeciders have had to loop over all nodes and their assigned shards, which is inefficient.
With larger clusters, reallocation becomes too slow. In our case, we have 5 years of data in daily indices, with 6 shards per index and a replication factor of 1. Recalculating the cluster state can take minutes, with the master sitting at 100% CPU in RoutingNodes.shardsRoutingFor( MutableShardRouting ).
The approach taken is:
a) making RoutingNodes a singleton, since only one active instance should ever exist anyway,
b) notifying RoutingNodes of state changes in MutableShardRouting instances.
This is certainly not the most elegant approach, and it adds complexity instead of removing it, but it is what can be done without a major refactoring of allocation.
In the supplied test case, execution of the final reallocation is sped up from 22 seconds to 4.2 seconds on my test machine.
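To illustrate point b), here is a minimal sketch of the notification idea, not the actual patch. The classes `NodeShardIndex` and `Shard` are hypothetical stand-ins for RoutingNodes and MutableShardRouting, and their methods are assumptions made for this example only. Each shard reports its own assignment changes, so the per-node view stays current and a lookup no longer needs to scan every node and shard.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class RoutingIndexSketch {

    /** Hypothetical stand-in for RoutingNodes: a per-node index of shards. */
    public static final class NodeShardIndex {
        private final Map<String, Set<Shard>> shardsByNode = new HashMap<>();

        /** Called by a shard whenever its node assignment changes. */
        void onAssignmentChanged(Shard shard, String oldNode, String newNode) {
            if (oldNode != null) {
                Set<Shard> previous = shardsByNode.get(oldNode);
                if (previous != null) {
                    previous.remove(shard);
                    if (previous.isEmpty()) {
                        shardsByNode.remove(oldNode);
                    }
                }
            }
            if (newNode != null) {
                shardsByNode.computeIfAbsent(newNode, k -> new LinkedHashSet<>()).add(shard);
            }
        }

        /** Direct lookup by node id instead of a scan over all nodes and shards. */
        public Set<Shard> shardsOn(String node) {
            Set<Shard> found = shardsByNode.getOrDefault(node, Collections.emptySet());
            return Collections.unmodifiableSet(found);
        }
    }

    /** Hypothetical stand-in for MutableShardRouting. */
    public static final class Shard {
        private final String indexName;
        private final int shardId;
        private final NodeShardIndex listener;
        private String currentNode; // null means unassigned

        public Shard(String indexName, int shardId, NodeShardIndex listener) {
            this.indexName = indexName;
            this.shardId = shardId;
            this.listener = listener;
        }

        /** Every state change goes through here, so the index is always notified. */
        public void assignTo(String node) {
            String previous = currentNode;
            currentNode = node;
            listener.onAssignmentChanged(this, previous, node);
        }

        public String currentNode() {
            return currentNode;
        }

        @Override
        public String toString() {
            return "[" + indexName + "][" + shardId + "]";
        }
    }

    public static void main(String[] args) {
        NodeShardIndex index = new NodeShardIndex();
        Shard shard = new Shard("logs-2013-01-01", 0, index);
        shard.assignTo("node-1");
        shard.assignTo("node-2"); // relocation: the index moves the shard itself
        System.out.println(index.shardsOn("node-1").size()); // prints 0
        System.out.println(index.shardsOn("node-2").size()); // prints 1
    }
}
```

The trade-off is the one described above: every code path that mutates a shard has to go through the notifying method, which adds coupling in exchange for cheap lookups.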
There is already a PR for this: #4257